Annotated Swadesh wordlists for the Gumuz group (Komuz family).

Languages included: Sai Gumuz [gum-sai]; Sese Gumuz [gum-ses]; Metemma Gumuz [gum-kok]; Gojjam Gumuz [gum-gjj].

DATA SOURCES

Main source

Bender 1979 = Bender, Lionel M. Gumuz: A Sketch of Grammar and Lexicon. In: Afrika und Übersee, 62, pp. 38-69. // The most important source of comparative lexical data on various Gumuz dialects, collected by the author himself as well as incorporating data from previous studies, both officially published and archival manuscripts. Unfortunately, the data come in the form of restricted lexical lists rather than complete vocabularies, and Bender's own materials seem to suffer from numerous phonetic and semantic inaccuracies.

Additional sources

Ahland 2012 = Ahland, Colleen Anne. A Grammar of Northern and Southern Gumuz. Ph. D. dissertation, University of Oregon. // Grammatical description of two different Gumuz dialects. Contains plenty of illustrative material, although not enough to convert to full-fledged lexicostatistical wordlists.

Innocenti 2010 = Innocenti, Marco. Note elementari di Grammatica Gumuz. Varietà di Mandura. Addis Ababa: Arada Books. // A grammatical description of the Mandura-Metemma variety of Gumuz, together with a comprehensive vocabulary of the language.

Uzar 1989 = Uzar, Henning. Studies in Gumuz. Sese phonology and TMA system. In: Topics in Nilo-Saharan Linguistics. Ed. by M. Lionel Bender. Hamburg: Helmut Buske Verlag, pp. 129-150. // A description of the phonetic system and parts of the verbal grammar of Sese Gumuz, well illustrated by examples.

NOTES

1. General.

Although most current classifications and catalogs list "Gumuz" as a single language, lexicostatistical comparison of the available data speaks strongly against such a decision: percentages of matches run as low as circa 70% for some of the "dialects". This indicates that "Gumuz" should rather be considered a small language group, on about the same level as, e.g., the Slavic languages.

However, this observation remains inconclusive, since we are still seriously hampered by the lack of comprehensive and reliable data sets for most of these "dialects" / "languages". There is no single, comprehensive, data-supported classification of these dialects, either, although most researchers indicate significant discrepancies between "Northern" and "Southern" varieties of the language.
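
As a rough illustration of how a percentage of matches between two wordlists can be obtained, the following minimal sketch compares two lists in which each Swadesh item carries a cognate class number. The function name, the data layout, and the toy values are assumptions made for this example only: they are not actual Gumuz data, and the sketch does not claim to reproduce the procedure used to compile the database. Items missing from either list are skipped rather than counted as mismatches, which is itself an assumption about how gaps should be handled.

```python
def percent_matches(list_a, list_b):
    """Return the percentage of shared cognate classes between two wordlists.

    Each wordlist maps a Swadesh item to a cognate class number,
    or to None when the item is not attested (a gap).
    """
    # Keep only the items attested in both lists.
    comparable = [
        item for item in list_a
        if list_a[item] is not None and list_b.get(item) is not None
    ]
    if not comparable:
        raise ValueError("the two wordlists have no items in common")
    matches = sum(1 for item in comparable if list_a[item] == list_b[item])
    return 100.0 * matches / len(comparable)


# Hypothetical toy data: the cognate class numbers are invented for illustration.
lect_a = {"item1": 1, "item2": 1, "item3": 1, "item4": 1, "item5": None}
lect_b = {"item1": 1, "item2": 2, "item3": 1, "item4": 1, "item5": 1}

print(percent_matches(lect_a, lect_b))  # 3 of 4 comparable items match: 75.0
```

With real data the result depends heavily on how many items are comparable at all, which is one reason why wordlists with multiple gaps yield less stable percentages.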

As a base reference model as well as our primary source of data, we use the information in [Bender 1979], despite the incompleteness of some of its wordlists and the general flaws of Bender's approach to data collection (occasional phonetic inaccuracies and semantic misglossings that are almost always revealed once more detailed and accurate sources on the same lects become available). Since these data are in themselves sufficient to allow for the compilation of four distinct wordlists on four different dialects (one of them contains multiple gaps, though), this makes [Bender 1979] the optimal, if still flawed, base choice. The dialects in question, and their correlation with subsequently published alternate sources, are as follows:

(a) "Sai Gumuz": data collected by Bender himself. "Sai" is described by him as "the name of a group of 'clans' scattered along the Diddesa valley up to the confluence with the Blue Nile" [Bender 1979: 39]. It more or less corresponds to the "Southern Gumuz" dialect described in [Ahland 2012];

(b) "Sese Gumuz": the basic data, reproduced in [Bender 1979], come from Lee Irwin's unpublished manuscript "on a Sai variety... found at the Diddesa-Nile confluence" [Bender 1979: 39]. Additional material on "Sese" or "Saysay" was collected by Henning Uzar at the village of Sirba (southern bank of the Blue Nile) and published in [Uzar 1989]. It should be noted that, although Bender calls Sese "a variety of Sai" (terminologically understandable, since the term "Se-Se / Say-Say" is itself merely a reduplicated variant of "Sai"), lexicostatistically Sese is quite distant from it; in fact, it shares the lowest percentage of common vocabulary with all other Gumuz dialects, making it of particular importance for the reconstruction of Proto-Gumuz;

(c) "Metemma Gumuz", also called "Mandura Gumuz" or "Kokit Gumuz" depending on the source. This is the most common and most often referred to variety of Gumuz, spoken around Metemma in the Semien Gondar Zone of Ethiopia. In this case, Bender's original data on "Kokit" (a village near Metemma) were thoroughly cross-checked with Marco Innocenti's data on "Mandura Gumuz" [Innocenti 2010] and C. A. Ahland's data on "Northern Gumuz" in [Ahland 2012];

(d) "Gojjam Gumuz" is the most problematic of the four dialects tackled in [Bender 1979]. The data here were reproduced from Carlo Conti Rossini's I Gunza ed il loro linguaggio (1919-20), where they were themselves reproduced from earlier accounts by A. T. d'Abbadie and other 19th-century sources. In other words, data on "Gojjam", in addition to being plagued with lexicostatistical gaps, are old, not very reliable, possibly conflate information from several dialects or subdialects, and are not easily identifiable with any of the modern dialectal varieties of Gumuz. Nevertheless, since the data form an integral part of Bender's comparative set, we thought it useful to still include them as a separate wordlist, despite all the problems; at the very least, they have some limited importance to the reconstruction of the wordlist for "Proto-Gumuz".

Pending the appearance of more reliable sources on Sai and Sese Gumuz, for the sake of consistency we include only Bender's data in all four "primary slots", listing the elicited differences between Bender and alternate sources in the Notes section. However, all lexicostatistical calculations between these four dialects have to be taken with a large grain of salt; the only definite conclusion that may be drawn is that individual varieties of Gumuz are lexically different from each other, indicating a certain period of mutually independent development that probably does not exceed 2,000 years, but could have also been significantly shorter than that, depending on how many "false non-cognates" have been included in the lists.
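
To show how a percentage of matches translates into a rough time depth, here is a minimal sketch of the classical glottochronological estimate t = ln(c) / (2 ln(r)), where c is the shared proportion and r is the assumed retention rate per millennium. The notes above do not state which formula or constants underlie the 2,000-year ceiling, so the function name, the choice of the classical formula, and the retention value of 0.86 (the figure commonly cited for the 100-item list) are all assumptions made for illustration only.

```python
import math

# Retention rate per millennium commonly cited for the 100-item list in
# classical glottochronology; an assumption, not a value from this database.
RETENTION_100 = 0.86


def divergence_millennia(shared, retention=RETENTION_100):
    """Classical estimate t = ln(c) / (2 * ln(r)), in millennia.

    shared    -- proportion of matches between two lists (0 < shared <= 1)
    retention -- assumed proportion retained by one language per millennium
    """
    if not 0.0 < shared <= 1.0:
        raise ValueError("shared must be in the interval (0, 1]")
    if not 0.0 < retention < 1.0:
        raise ValueError("retention must be in the interval (0, 1)")
    return math.log(shared) / (2.0 * math.log(retention))


# Circa 70% is the lowest figure mentioned for the Gumuz "dialects".
print(round(divergence_millennia(0.70), 2))
```

Under these assumptions, a lower share of matches always yields a greater estimated depth, so every "false non-cognate" in the lists pushes the estimate further back than the data warrant.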

2. Transcription.

Most varieties of Gumuz have fairly complicated phonological systems, not all of which have been ideally described in available sources (not to mention that alternate sources frequently disagree on the phonological structures of individual words). In particular, it is generally agreed that Gumuz features a four-way opposition in the stop system (voiced, voiceless, ejective, implosive) and a three-way opposition in the affricate system (alveolar, alveo-palatal, palatal). For the sake of clarity, below we offer a comparative table of the various transcriptions employed by M. L. Bender (for all four dialects), C. A. Ahland (for "Northern" = "Metemma" Gumuz and "Southern" = "Sai" Gumuz), M. Innocenti (for Metemma), and H. Uzar (for Sese), together with their unified UTS re-coding.

[Bender 1979]	[Ahland 2012]	[Innocenti 2010]	[Uzar 1989]	UTS	Notes
p	p	p	p	p
b	b	b	b	b
--	pʼ	pʼ	pˈ	pʼ
ɓ	ɓ	bʼ	ɓ	ɓ
f	f	f	f	f
		(v)		(v)	Positional variant.
m	m	m	m	m
w	w	w	w	w
t	t	t	t	t
d	d	d	d	d
tˈ	tʼ	tʼ	tˈ	tʼ
ɗ	ɗ	dʼ	ɗ	ɗ
n	n	n	n	n
l	l	l	l	l
r	ɾ	r	r	r
ts	ts	ts	c	c
tsˈ ~ sˈ	tsʼ	sʼ	cˈ	cʼ ~ sʼ
s	s	s	s	s
z	z	z	z	z
c	tʃ	c	č	č
cˈ	tʃʼ	cʼ	čˈ	čʼ
š	ʃ	sh	š	š
ž	ʒ	zh	ž	ž
kʸ	c	ç	c_	ɕ
gʸ	ɟ	j	j_	ʓ
	cʼ	çʼ	c_ˈ	ɕʼ
ɲ	ɲ	ñ	ɲ	ɲ
y	j	y	y	y
k	k	k	k	k
g	g	g	g	g
kˈ	kʼ	kʼ	kˈ	kʼ
x ~ h	χ	h	h	x ~ χ ~ h
ŋ	ŋ	ŋ	ŋ	ŋ
ʔ	ʔ	ʼ	ʔ	ʔ

Additional notes:
1. There seems to be no phonological opposition between cʼ and sʼ. In fact, most authors usually just employ the misleading transcription sʼ (creating the illusion of an "ejective fricative") so as to avoid the print appearance of a complex trigraph tsʼ. However, in [Bender 1979] sʼ and tsʼ sometimes appear for the same dialect, and it is not immediately clear whether this is just the result of a transcriptional inaccuracy or if it reflects some phonetic peculiarities (e. g. "fricativization" in particular contexts). For this reason, we do not completely unify the attested transcriptions in this respect.

2. The phonemes of the č, š series are alternately described in the literature as "palatal" (Bender) or "alveo-palatal" (Uzar, Innocenti, Ahland). Phonemes of the ɕ series are described either as "post-palatal" (Bender) or as "palatal" (Uzar, Innocenti, Ahland). Note, however, that Ahland also places the fricatives ʃ, ʒ (= UTS š, ž) in the "palatal" category, against all other researchers who group them together with the "alveo-palatal" affricates. Note also that Bender has no "post-palatal ejective" in his system; apparently, he does not see any difference between čʼ and ɕʼ, which all other researchers perceive.

3. Uzar is the only researcher to specifically note the existence of the rare velar implosive ɠ, postulated by him, e. g., in the word ɠàm 'know, learn'. It is not clear whether this observation is to be trusted without additional confirmation.

4. There is no phonological opposition between the velar fricative x, the uvular fricative χ, and the laryngeal fricative h; all three seem to be dialectal variants of the same phoneme. We preserve the original transcriptions, as they may indicate genuinely different places of articulation, but the phoneme is essentially the same. Innocenti has an additional notation ḥ (= UTS ħ) for some of the words (e. g. 'black'); it is extremely doubtful, however, that this represents anything other than yet another positional variant (and there are no traces of any additional laryngeal phonemes in Ahland's superior description of Gumuz phonetics/phonology).

5. The glottal stop has definite phonological status in word-medial position. Its status in word-initial position is less certain, although some researchers (e. g. Uzar) seem to regard it as phonemic there as well. We preserve the notation ʔ in all cases where it is specifically marked in any of the sources, both in word-medial and word-initial position.
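
The consonant recoding tabulated above can be sketched as a small lookup procedure. The example below covers only the [Ahland 2012] column, and only those symbols that differ from their UTS equivalents; everything else (including vowels, tonal markers, and the preserved χ) passes through unchanged. The names AHLAND_TO_UTS and recode are hypothetical, the sketch assumes that every written "ts" or "tʃ" sequence is an affricate rather than a cluster, and the test strings are made-up sequences, not attested Gumuz words.

```python
# Ahland 2012 symbols that differ from UTS, taken from the table above.
AHLAND_TO_UTS = {
    "tʃʼ": "čʼ",
    "tsʼ": "cʼ",
    "tʃ": "č",
    "ts": "c",
    "cʼ": "ɕʼ",
    "c": "ɕ",
    "ɟ": "ʓ",
    "ʃ": "š",
    "ʒ": "ž",
    "ɾ": "r",
    "j": "y",
}


def recode(form, table=AHLAND_TO_UTS):
    """Recode a transcribed form left to right, longest symbol first.

    Scanning the input once (instead of chaining str.replace calls) keeps
    an output symbol from being recoded again: "ts" becomes "c", and that
    "c" must not then be turned into "ɕ".
    """
    keys = sorted(table, key=len, reverse=True)
    out = []
    i = 0
    while i < len(form):
        for key in keys:
            if form.startswith(key, i):
                out.append(table[key])
                i += len(key)
                break
        else:
            # No table entry starts here: copy the character unchanged.
            out.append(form[i])
            i += 1
    return "".join(out)


# Made-up strings, used only to exercise the mapping.
print(recode("tʃa"))     # ča
print(recode("catsʼa"))  # ɕacʼa
print(recode("jaɾ"))     # yar
```

A separate table would be needed for each source column, since the same symbol can stand for different phonemes depending on the author: "c" is the alveo-palatal affricate in Bender and Innocenti, the palatal affricate in Ahland, and the alveolar affricate in Uzar.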

Vowels. Most researchers tend to agree upon a five-vowel phonological system (a, e, i, o, u) for all Gumuz dialects; however, allophonic variation is frequently attested and graphically marked (e. g. e ~ ɛ, o ~ ɔ, u ~ ʋ, i ~ ı, a ~ ʌ ~ ǝ). The "schwa" (ǝ) in particular can have "near-phonemic" status in some of the sources, although it is ultimately a positionally neutralized (unstressed?) variant of several vowels. To avoid unnecessary complications, we preserve all the original phonetic notation without any additional recoding.

Vowel length is phonologically distinctive in Gumuz, although marked rather inconsistently in different sources.

Prosody. Gumuz has alternately been described as a non-tonal language (Bender), a language with two tonal levels, high and low (Ahland), or with three tonal levels, including mid (Uzar, Innocenti). It seems that Ahland's explanation of the "mid" level as secondary (the result of downstep in complex structures) is correct. However, for the sake of simplicity we retain all the original tonal markers the way they are indicated in individual sources.

Database compiled and annotated by: G. Starostin (last update: November 2015).